SMART-INT: A SYSTEM FOR ANSWERING QUERIES OVER WEB DATABASES USING ATTRIBUTE DEPENDENCIES
Abstract
Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, the information about individual entities, which is fragmented over multiple tables, must be integrated. At first blush this is just the inverse of the traditional database normalization problem: the universal relation is to be reconstructed from the given tables (sources). However, the tables may be missing primary key/foreign key relations, which leads to technical challenges. Reconstructing and retrieving the relevant entities will clearly require joining the tables. While the tables do share attributes, direct joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. This work addresses the problem of data integration in such scenarios. To make up for the missing primary key/foreign key relations, the approach mines and uses attribute dependencies, which can be either intra-table or inter-table. Given a query, these dependencies are used to piece together a tree of relevant tables and schemes for joining them. For the experimental evaluation, a master table was fragmented horizontally and vertically to generate a set of overlapping tables. A set of randomly generated queries was issued against both the master table and the fragmented tables. The results from the master table served as the ground truth against which the results of this approach were evaluated in terms of precision and recall. Answering from a single table and using direct joins were employed as baselines. The result tuples produced by this approach are shown to strike a favorable balance between precision and recall.
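The following toy sketch illustrates the two ideas the abstract contrasts: a direct join over a shared, non-key attribute that manufactures spurious tuples, and an attribute-dependency score used to pick a better join attribute, with both evaluated by precision and recall against the master table. The table contents, the attribute names, the g3-based confidence measure for an approximate functional dependency, and the "join on the most determining shared attribute" heuristic are all hypothetical assumptions for illustration; this is not SMART-INT's actual algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical master table: the ground-truth "universal relation".
MASTER = [
    {"model": "Civic",  "make": "Honda",  "body": "sedan", "price": 9000},
    {"model": "Accord", "make": "Honda",  "body": "sedan", "price": 12000},
    {"model": "CR-V",   "make": "Honda",  "body": "suv",   "price": 15000},
    {"model": "Camry",  "make": "Toyota", "body": "sedan", "price": 11000},
    {"model": "RAV4",   "make": "Toyota", "body": "suv",   "price": 16000},
]


def project(rows, attrs):
    """Vertical fragment: keep only `attrs` and drop duplicate tuples."""
    return [dict(t) for t in sorted({tuple((a, r[a]) for a in attrs) for r in rows})]


def afd_confidence(rows, lhs, rhs):
    """Confidence of the approximate dependency lhs -> rhs, computed as 1 - g3:
    the largest fraction of rows that can be kept so that each lhs value
    maps to exactly one rhs value."""
    groups = defaultdict(Counter)
    for r in rows:
        groups[r[lhs]][r[rhs]] += 1
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return kept / len(rows)


def join(left, right, on):
    """Equi-join on a single attribute; right-hand values win on any other
    shared attribute. Matching on a weak attribute pairs unrelated rows."""
    index = defaultdict(list)
    for rt in right:
        index[rt[on]].append(rt)
    return [{**lt, **rt} for lt in left for rt in index[lt[on]]]


def precision_recall(answer, truth, attrs):
    """Compare distinct answer tuples, projected on `attrs`, with the truth."""
    got = {tuple(r[a] for a in attrs) for r in answer}
    expected = {tuple(r[a] for a in attrs) for r in truth}
    hits = len(got & expected)
    precision = hits / len(got) if got else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


# Two overlapping vertical fragments that share "model" and "make".
T1 = project(MASTER, ["model", "make", "price"])
T2 = project(MASTER, ["model", "make", "body"])
SHARED = ["model", "make"]

# Hypothetical heuristic: join on the shared attribute that best
# determines the attribute the query needs from T2.
best = max(SHARED, key=lambda a: afd_confidence(T2, a, "body"))

for attr in SHARED:
    prec, rec = precision_recall(join(T1, T2, attr), MASTER, ("price", "body"))
    conf = afd_confidence(T2, attr, "body")
    print(f"join on {attr}: conf={conf:.2f} precision={prec:.2f} recall={rec:.2f}")
print("chosen join attribute:", best)
```

Running it prints a confidence of 1.00 for `model -> body` and 0.60 for `make -> body`. Joining on `model` gives precision 1.00 and recall 1.00. Joining on `make` keeps recall at 1.00 but drops precision to 0.50, because every Honda price is paired with both Honda body styles. In real sources the determining attribute is rarely an exact key, which is why the sketch scores candidates with an approximate confidence rather than a strict functional-dependency check.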
Publication date: 2009